Perceptual
Diagram

The fixed-bitrate perceptual coder is very similar to the Bands coder, but it uses auditory masking to hide the quantization noise. Its pipeline is as follows: a PCM sound vector (read from a wav file) is enframed into chunks, and the MDCT and the FFT of each chunk are computed. The coder then builds an approximation of the masking spectrum for each frame, allocates bits using the SMR computed from the frame's spectrum in dB SPL, and uses this allocation to encode the bands of the frame's MDCT depending on their energy. The MDCT frames are written to a .perc file, which can be passed to a decoder to obtain a playable 16-bit wav file (this stands in for an audio player that would play the file in real time).

Coder

One of the main functions of this module is the coder (function perceptual in Perceptual.py). This function receives a path to a wav file (originalFile), the bitrate it should achieve (bitrate), the frame length (N) and the output name (codedFile). These parameters have default values to allow quick tests (originalFile is 'drumsA.wav' in the sounds directory, bitrate is 128000, N is 1024, and codedFile is 'yourfile.perc').

First it creates an Output directory if no folder with that name exists, and adapts the output path so that the file is saved under this directory. It then informs the user through the terminal that encoding has started, defines some variables and reads the wav input file 'originalFile' using the function wavread. To express magnitudes in dB SPL, it computes max_fft, the maximum of the FFT of a 1000 Hz pure tone, which is taken as the 96 dB reference. It enframes the signal with the enframe method explained before and obtains the number of frames generated, in order to iterate through them later. It opens the stream and the file and writes the header with the information needed to decode the body of the file: sampling frequency, frame length, bitrate (although it is not strictly necessary), scale bits for the gain and number of frames.
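The dB SPL calibration described above can be sketched as follows. This is a minimal illustration, not the code in Perceptual.py: the function names `spl_reference` and `to_spl` are hypothetical, the tone is assumed to be a full-scale sine (amplitude 1.0), and `np.fft.rfft` is used as one way of obtaining "half of the FFT".

```python
import numpy as np

def spl_reference(N, fs=44100, f0=1000.0):
    """Peak FFT magnitude of a full-scale f0 sine over one frame of N samples.

    Assumption: full scale means amplitude 1.0.
    """
    t = np.arange(N) / fs
    tone = np.sin(2 * np.pi * f0 * t)
    return np.max(np.abs(np.fft.rfft(tone)))

def to_spl(frame, max_fft, ref_db=96.0):
    """Half-spectrum magnitude of a frame in dB SPL, with max_fft mapped to ref_db."""
    mag = np.abs(np.fft.rfft(frame))
    # The small floor avoids log10(0) on silent bins.
    return ref_db + 20 * np.log10(np.maximum(mag, 1e-12) / max_fft)
```

With this convention the 1000 Hz full-scale tone peaks at exactly 96 dB SPL, and halving its amplitude lowers the peak by about 6 dB.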
Each frame is then encoded as follows. The coder first selects the frame from the matrix of the enframed sound, takes half of its FFT and computes its magnitude. It initialises the gain vector and the bit allocation vector. Then, if the frame has energy, it allocates the bits using the half FFT of the frame in dB SPL. To allocate the bits it uses the function allocate from the utilFunctions module. Unlike the previous coder (Bands), it uses the Signal to Masker Ratio (SMR) as the reference that allocate uses to give bits to each band.

The SMR is computed starting from the threshold in quiet, which is similar to an equal-loudness contour [8] and approximates the quietest sound that our hearing can detect. To this threshold in quiet the coder adds a masking curve, built with the Schroeder function, for each spectral peak found with the function peakDetection from utilFunctions. This combination produces a masking threshold [9], which is then subtracted from the spectrum of the frame to obtain the SMR. The result is that frequency-domain auditory masking is used to hide the quantization noise. Each bit used to quantize a band reduces the noise of that band by about 6 dB, so quantization noise could be removed entirely by assigning each band enough bits to bring its noise down to zero. In this model, however, we consider that the noise will not be heard as long as it stays below the masking threshold. The number of bits needed to mask the quantization noise of a band is therefore the number of bits needed to cover the maximum SMR of that band.

Given this bit allocation for each band of the frame, we use the utilFunctions method p_encode to quantize the MDCT of the frame and obtain a gain factor, as explained before.
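The SMR-based allocation described above can be sketched as follows. This is only an illustration under stated assumptions and not the allocate or peakDetection code from utilFunctions: all function names are hypothetical, the threshold in quiet uses Terhardt's approximation, the Hz-to-Bark mapping uses Zwicker's approximation, the curves are combined by summing powers, the masker offset `drop_db` is an assumed value, and the fixed-bitrate budget that the real allocator must respect is ignored.

```python
import numpy as np

def bark(f):
    """Zwicker's Hz-to-Bark approximation."""
    f = np.asarray(f, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def threshold_in_quiet(f):
    """Terhardt's approximation of the threshold in quiet, in dB SPL."""
    # Frequencies are clamped to 20 Hz so the DC bin does not divide by zero.
    khz = np.maximum(np.asarray(f, dtype=float), 20.0) / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * np.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def schroeder(dz):
    """Schroeder spreading function in dB; dz is the Bark distance to the masker."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * np.sqrt(1.0 + (dz + 0.474) ** 2)

def masking_threshold(spl, freqs, peak_bins, drop_db=16.0):
    """Power-sum the threshold in quiet with one Schroeder curve per peak.

    drop_db (how far below the masker its curve sits) is an assumed value.
    """
    z = bark(freqs)
    power = 10.0 ** (threshold_in_quiet(freqs) / 10.0)
    for k in peak_bins:
        curve = spl[k] - drop_db + schroeder(z - z[k])
        power += 10.0 ** (curve / 10.0)
    return 10.0 * np.log10(power)

def bits_from_smr(spl, mask, band_edges, db_per_bit=6.02, max_bits=15):
    """Bits per band needed to cover the band's maximum SMR (no bit budget)."""
    smr = spl - mask
    bits = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        need = np.ceil(max(smr[lo:hi].max(), 0.0) / db_per_bit)
        bits.append(int(min(need, max_bits)))  # 4-bit allocation field: 0..15
    return bits
```

For a single 90 dB SPL peak with the assumed 16 dB offset, the band containing the peak needs 3 bits (16 dB / 6.02 dB per bit, rounded up), while bands whose spectrum lies below the masking threshold get none.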
Once we have the bit allocation, the quantized MDCT and the gain, we write these values to the file: first the gain, then the bit allocation array, and finally, for each band, its quantized values, but only if any bits were allocated to it. Every 10 frames, information about the encoding status is displayed in the terminal. When encoding has finished, the coder displays further information. The coder ends by flushing the stream, closing the file and returning the path to the coded file.

Bitstream

The bitstream format of the .perc files consists of a header with the information needed to decode the file, followed by a body with the quantized frames and the information needed to decode each frame.

Header:
* 16 bits to represent the sampling frequency.
* 12 bits to represent the length of each frame (N).
* 19 bits to represent the bitrate.
* 4 bits to represent the scale bits of the gain.
* 26 bits to represent the number of frames of the file.
Total number of bits for the header: 77 bits.
----
Body:
For each frame:
* bands*(scale bits) bits to represent the gain array.
* bands*4 bits to represent the bit allocation array.
* For each band:
** (bit allocation of the band)*(length of the band) bits for the quantized MDCT of the band.
Note: If the sampling frequency is 44100 Hz, then the number of bands is 25.

The total number of bits for the body varies considerably depending on the parameters.

Total number of bits for the file: 77 bits of the header + bits of the body + padding bits.

Decoder

The second main function of the Perceptual Coder is the decoder (function percDecoder in Bands.py). This function receives a path to a .perc file as an input parameter. The decoder checks that the file exists and creates the Output directory if it does not exist. It adapts the output path and prints information. It defines the Birdie Reduction Constant (which will be used later) and then reads the header to obtain the variables needed to decode the file.
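The header layout from the Bitstream section, which the decoder reads first, can be sketched as follows. This is a hedged illustration rather than the module's actual stream code: the names `HEADER_FIELDS`, `pack_header`, `unpack_header` and `frame_bits` are hypothetical, MSB-first bit order is an assumption, and the header is padded to a whole byte on its own purely for illustration (the text suggests the real file is padded once, at the end).

```python
# Field widths in the order given in the Bitstream section (77 bits in total).
HEADER_FIELDS = [("fs", 16), ("N", 12), ("bitrate", 19),
                 ("scale_bits", 4), ("n_frames", 26)]

def pack_header(values):
    """Pack the header fields MSB-first; return (bytes, number of payload bits)."""
    acc, nbits = 0, 0
    for name, width in HEADER_FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        acc = (acc << width) | v
        nbits += width
    pad = (-nbits) % 8  # zero bits appended to reach a byte boundary
    return (acc << pad).to_bytes((nbits + pad) // 8, "big"), nbits

def unpack_header(data):
    """Inverse of pack_header: recover the field values from the packed bytes."""
    total = sum(width for _, width in HEADER_FIELDS)
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - total)
    out = {}
    for name, width in reversed(HEADER_FIELDS):
        out[name] = acc & ((1 << width) - 1)
        acc >>= width
    return out

def frame_bits(bit_alloc, band_lengths, scale_bits):
    """Size of one frame of the body, following the per-frame layout above."""
    bands = len(bit_alloc)
    payload = sum(b * n for b, n in zip(bit_alloc, band_lengths))
    return bands * scale_bits + bands * 4 + payload
```

With the default coder parameters (44100 Hz, N = 1024, 128000 bit/s) every field fits in its width, and the 77 header bits occupy 10 bytes once padded.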
It prints more information and then starts decoding the frames. To decode a frame, it first reads the gain factor and then the bit allocation. It initialises certain values that will be needed later and starts reading the values of the MDCT bands of the frame (if bits were allocated to the band; otherwise it sets them to zero). Once the values of the frame have been read from the file, it dequantizes them, unless the value is zero and the bit allocation of the band is zero. It then applies the gain and the birdie reduction explained before. Finally, it applies the inverse MDCT to the decoded MDCT frame and saves the result so that the frames can be joined later.

Once all the frames have been decoded and stored in a vector, some information is printed and the frames are joined by overlapping them in the same way as they were divided. Since, again, no window (triangular or otherwise) was applied in the enframing, the sound is normalized to obtain values between -1 and 1. Finally, the decoder writes the sound to a wav file using wavwrite from utilFunctions, informs the user through the terminal that decoding has finished and returns the path to the coded file.
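The final joining and normalization steps can be sketched as follows. This is a minimal illustration with hypothetical function names; the hop size is passed in explicitly because the text does not state the overlap used by enframe (a hop of N/2 would be the usual choice for an MDCT and is only an assumption here).

```python
import numpy as np

def overlap_add(frames, hop):
    """Join frames of length N by summing them at offsets of hop samples."""
    n_frames, N = frames.shape
    out = np.zeros(hop * (n_frames - 1) + N)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + N] += frame
    return out

def normalize(x):
    """Scale x so that its peak absolute value is 1; silence is left unchanged."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x
```

Without a window, overlapping regions are summed at full amplitude, so the joined signal can exceed the original range; dividing by the peak brings it back into [-1, 1] before it is written as 16-bit audio.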